Git HTTPS: https://github.com/ShubhP96/R-project.git

Data < - https://drive.google.com/file/d/1qoRhkTT8ObPngfst2kPA_-p0dlOSbWps/view?usp=share_link

Motivation and Overview:

Supply chain management is the coordination and management of the various activities and processes involved in the production and delivery of goods and services, from the initial raw material sourcing to the final delivery of the finished product to the customer. The supply chain is a complex network of entities, activities, information, and resources involved in the production and delivery of goods and services to customers. It spans across multiple stages, including planning, sourcing, manufacturing, logistics, and customer service, and involves numerous stakeholders, such as suppliers, manufacturers, distributors, retailers and customers. Managing the supply chain effectively is critical for achieving customer satisfaction, minimizing costs, and improving efficiency and profitability.

Supply chain data analysis involves the collection, processing, and interpretation of data from various sources within the supply chain, including internal systems, customer data, and external sources such as market data and industry trends. The goal is to extract insights and knowledge that can help improve supply chain operations, optimize resource allocation, and enhance customer experience.

The goal of this project is to analyze supply chain data to gain insights and make informed decisions. The motivation behind this project is to help businesses optimize their supply chain operations and improve their overall performance. By analyzing various aspects of the supply chain such as sales, inventory, delivery times, and customer feedback, we can identify areas for improvement and implement effective strategies to enhance the efficiency and profitability of the supply chain.

We also decided to create a Shiny app because it’s a powerful way to communicate the insights and findings of data analysis to a broad audience in an interactive and engaging way. Another motivation for creating a Shiny app is to provide a way for non-technical stakeholders to easily access and understand the results of a data analysis project, or to enable data analysts to explore and interact with their data in new ways. By creating a Shiny app, users can easily visualize and explore data, and make informed decisions based on the insights gained from the analysis. Additionally, Shiny apps provide a way to share the results of a data analysis project with a wider audience, facilitating collaboration and decision-making based on the insights gleaned from the data.In our shiny app we have added 5 pages.

The first page named Data Visualization contents droup-down name of 6 plots and on the left side and the plots on the right side. The second page named Sales and Profit Analysis contents three droup-down in the first droup-down we can select sales or profit in second droup-down we can select year from 2015 to 2018 in third droup-down we can select month or quarter and it will generate plot according to that info.The third page contents Data Tables which have the option to select every column plots and a slider to adjust the range from 0 to 500. The last fifth page named map contents heat maps generated my three droup-downs (Latitude, Longitude and Sales)

The target audience for this project would be as follows:

● Data scientists, machine learning engineers, and AI developers who are interested in developing algorithms, models, and systems for analyzing and optimizing the supply chain.

● Business managers, executives, and analysts who work in the supply chain management industry.Researchers and academics in the fields of supply chain management, logistics, operations management, and data analytics.

● Students and educators who are interested in learning about supply chain management and data analytics. These individuals may be interested in using the dataset for educational purposes, such as developing case studies, conducting research projects, or teaching courses on supply chain management and data analytics.

Related Work:

We drew inspiration from various sources including academic papers, online resources, and discussions in class. This are some of the papers and resources we refereed and found helpful for our project are as follows:

“Supply Chain Management: An Analytical Framework for Critical Literature Review” by M. S. Khan and M. H. Kumar, “Machine Learning in Supply Chain Management: A Comprehensive Overview” by C. M. S. Sá et al. (2020), “Predictive big data analytics for supply chain demand forecasting: methods, applications, and research opportunities” by Mahya Seyedan & Fereshteh Mafakheri, “The Impact of Digitalization on Supply Chain Management: A Conceptual Framework and Future Research Directions” by M. M. Hassan et al. (2020). “Big Data Analytics in Supply Chain Management: A State-of-the-Art Literature Review” by K. Subramanian et al. (2019). Supply Chain Digest - https://www.scdigest.com/

This papers and website provided a comprehensive overview of the different aspects of supply chain management and the various analyatical techniques used in the field. We also looked at real-world examples of successful supply chain management, such as Amazon and Walmart, to gain insights into their strategies and practices.

Initial Questions:

At the start of the project, we had several questions in mind, such as:

Q.What are the major factors that affect supply chain performance?

Q.How does sales vary across different product categories?

Q.What is the percentage of late deliveries, and what factors contribute to late deliveries?

Q.How can we optimize inventory management to reduce costs and improve customer satisfaction?

Q.What is the impact of delivery times on customer satisfaction?

Q.How can we leverage customer feedback to improve supply chain operations?

Q.What is the relationship between the shipping mode and late delivery risk?

As we progressed with the project and analyzed the data, we refined our questions and considered new ones. For example, we started looking at regional variations in supply chain performance and the impact of seasonal trends on sales and inventory management.

Data:

A data set of Supply Chains used by the company DataCo Global is being used for the analysis in this. This allows the use of Machine Learning Algorithms and R Software. The data set has overall 180519 rows and 53 columns with valuable information related to sales of different goods in various regions dated from 2015 JAN till 2018 JAN. We also removed any rows with missing data. The final dataset contained 44 columns.

The dataset used in this project is the “DataCoSupplyChainDataset.csv” which contains information on various aspects of the supply chain including orders, products, customers, shipping, and feedback. The dataset was imported into R using the read.csv() function and was prepossessed to convert the order date to a date format using the as.Date() function. We also used various functions from the dplyr and lubridate packages for data wrangling and manipulation.

Exploratory Data Analysis:

We used various visualizations such as histograms, scatterplots, and boxplots to gain insights into the data and identify patterns and trends. These visualizations helped us understand the distribution of sales per customer, delivery status distribution, late delivery risk, and sales by product category. We also used summary statistics to gain insights into numerical variables such as the number of days for shipping and benefit per order. For example, we plotted the sales and profit trends over time to see how they vary by month and year. We also used bar plots to compare sales and profit by product category and region. Based on these visualizations, we decided to focus on analyzing sales and profit by month and quarter.

Data Analysis:

To analyze the data, we used various statistical and computational methods to analyze the dataset such as regression analysis, including data wrangling, data visualization, and descriptive statistics. We used regression analysis to model the relationship between sales and various factors such as product category, region, and marketing expenses and ANOVA to compare the sales and profit across different quarters and years. We also used time series analysis to forecast sales and inventory levels based on historical trends and linear regression to analyze the relationship between the shipping mode and late delivery risk.

Narrative and Summary:

From our analysis, we found that the major factors affecting supply chain performance include inventory management, delivery times, and customer feedback. By optimizing these factors, we can improve the efficiency and profitability of the supply chain. We also found that sales and profit vary significantly by product category and region, highlighting the need for targeted marketing and inventory management strategies. However, there are limitations to our analysis such as the lack of information on competitor activities and external factors such as economic conditions. Nonetheless, our analysis provides valuable insights into the supply chain operations and can be used to inform decision-making in businesses.

In conclusion, we learned that the company’s sales are mainly driven by a few categories, including Technology and Office Supplies. We also found that the delivery status distribution varied significantly across different product categories. Additionally, we found that the percentage of late deliveries was relatively small but could be attributed to factors such as the product category, shipping mode, and order region. Finally, we found that the shipping mode has a significant impact on the late delivery risk, with Standard Class being associated with a higher risk of late deliveries compared to other shipping modes. However, the analysis has several limitations, including the lack of data on customer satisfaction, the absence of data on competitors, and the limited number of variables.

# Loading necessary library files
library(shiny)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.1     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors
library(lubridate)
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(leaflet)
library(leaflet.extras)
data <- read.csv("DataCoSupplyChainDataset.csv")
data <- subset(data, select = -c(Customer.Email, Customer.Password, Order.Item.Cardprod.Id, Order.Zipcode, Product.Card.Id, Product.Description, Product.Image))

data <- na.omit(data)
# Display summary statistics for numerical variables
summary(data[, c("Days.for.shipping..real.","Days.for.shipment..scheduled.","Benefit.per.order","Sales.per.customer")])
##  Days.for.shipping..real. Days.for.shipment..scheduled. Benefit.per.order 
##  Min.   :0.000            Min.   :0.000                 Min.   :-4274.98  
##  1st Qu.:2.000            1st Qu.:2.000                 1st Qu.:    7.00  
##  Median :3.000            Median :4.000                 Median :   31.52  
##  Mean   :3.498            Mean   :2.932                 Mean   :   21.98  
##  3rd Qu.:5.000            3rd Qu.:4.000                 3rd Qu.:   64.80  
##  Max.   :6.000            Max.   :4.000                 Max.   :  911.80  
##  Sales.per.customer
##  Min.   :   7.49   
##  1st Qu.: 104.38   
##  Median : 163.99   
##  Mean   : 183.11   
##  3rd Qu.: 247.40   
##  Max.   :1939.99
sales_histogram <- function(data) {
  ggplot(data, aes(x = Customer.Id, fill = after_stat(x))) +
    geom_histogram() +
    scale_fill_gradient(low = "blue", high = "red") +
    labs(x = "Sales per customer", y = "Count", title = "Distribution of Sales per Customer")
}

sales_histogram(data)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Function to create histogram of delivery status distribution
create_delivery_status_histogram <- function(data) {
  ggplot(data, aes(x = Delivery.Status, fill = Delivery.Status)) +
    geom_bar() +
    labs(x = "Delivery Status", y = "Count", title = "Delivery Status Distribution") +
    scale_fill_manual(values = c("green", "blue", "red", "orange", "purple")) # Set custom colors
}

create_delivery_status_histogram(data)

late_delivery_category_plot <- function(data) {
  # Filter data based on "Late delivery" status
  late_delivery <- data %>%
    filter(Delivery.Status == "Late delivery")
  
  # Count the number of late deliveries by Category Name
  category_counts <- late_delivery %>%
    group_by(Category.Name) %>%
    summarise(count = n()) %>%
    arrange(desc(count)) %>%
    top_n(10)
  
  # Plot the top 10 categories by count of late deliveries
  ggplot(category_counts, aes(x = Category.Name, y = count, fill = Category.Name)) +
    geom_col() +
    labs(x = "Category Name", y = "Count of Late Deliveries", title = "Top 10 Categories by Late Delivery Count") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
late_delivery_category_plot(data)
## Selecting by count

plot_late_delivery_by_country <- function(data) {
  # Filter data based on "Late delivery" status
  late_delivery <- data %>%
    filter(Delivery.Status == "Late delivery")

  # Count the number of late deliveries by Order.Country
  country_counts <- late_delivery %>%
    group_by(Order.Country) %>%
    summarise(count = n()) %>%
    arrange(desc(count)) %>%
    top_n(10)

  # Plot the top 10 countries by count of late deliveries
  ggplot(country_counts, aes(x = Order.Country, y = count, fill = Order.Country)) +
    geom_col() +
    labs(x = "Country", y = "Count of Late Deliveries", title = "Top 10 Countries by Late Delivery Count") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
plot_late_delivery_by_country(data)
## Selecting by count

# Function to create boxplot of sales by category name
create_sales_boxplot <- function(data) {
  ggplot(data, aes(x = Category.Name, y = Sales, fill = Category.Name)) +
    geom_boxplot() +
    labs(title = "Category Name vs. Sales")
}
create_sales_boxplot(data)

# Function to create boxplot of late delivery risk by shipping mode
create_delivery_boxplot <- function(data) {
  ggplot(data, aes(x = Shipping.Mode, y = Late_delivery_risk, fill = Shipping.Mode)) +
    geom_boxplot() +
    labs(title = "Shipping Mode vs. Late Delivery Risk")
}
create_delivery_boxplot(data)

# Highest revenue based on Department.

plot_sales_by_department <- function(data) {
  # Aggregate sales by department
  sales_by_department <- data %>%
    group_by(Department.Name) %>%
    summarize(total_sales = sum(Sales))
  
  # Create the plot
  p <- ggplot(sales_by_department, aes(x = Department.Name, y = total_sales, fill = Department.Name, 
                                        text = paste("Department: ", Department.Name, "<br>", "Sales: $", 
                                                     scales::comma(total_sales)))) +
    geom_bar(stat = "identity") +
    labs(title = "Total Sales by Department", x = "Department", y = "Sales") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
  
  # Convert the plot to Plotly
  ggplotly(p)
}
plot_sales_by_department(data)
top_10_orders_by_country <- function(data) {
  # calculate the count of orders by Order.Country
  orders_by_country <- data %>%
    group_by(Order.Country) %>%
    summarize(count = n()) %>%
    arrange(desc(count)) %>%
    slice(1:10)

  # create the plot 
  p <- ggplot(orders_by_country, aes(x = Order.Country, y = count, fill = Order.Country, text = paste("Country: ", Order.Country, "<br>", "Orders: ", count))) +
    geom_bar(stat = "identity") +
    labs(title = "Top 10 Order Countries by Customer Orders") +
    xlab("Order Country") +
    ylab("Customer Orders") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

  # convert the plot to plotly
  ggplotly(p)
}
top_10_orders_by_country(data)
plot_order_region_count <- function(data) {
  
  # create a data frame with the count of orders by order region
  order_region_count <- data %>%
    group_by(Order.Region) %>%
    summarise(Count = n()) %>%
    arrange(desc(Count)) 
  
  # create the plot
  plot <- ggplot(order_region_count, aes(x = Order.Region, y = Count, fill = Order.Region, text = paste("Order Region: ", Order.Region, "<br>", "Order Count: ", Count))) +
    geom_bar(stat = "identity") +
    labs(title = "Order Regions by Count of Orders from Customers") +
    xlab("Order Region") +
    ylab("Order Count") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
    scale_fill_viridis_d()
  
  # convert the plot to plotly
  ggplotly(plot)
}
plot_order_region_count(data)
# get the top 20 customers who did the highest sales.

top_customers_sales_plot <- function(data) {
  
  # create a data frame with sales per customer
  sales_per_customer <- data %>%
    group_by(Customer.Id) %>%
    summarise(total_sales = sum(Sales)) %>%
    arrange(desc(total_sales))
  
  # get the top 20 customers who did the highest sales.
  top_customers <- head(sales_per_customer, 20)
  
  # create a new column with the combined first and last name of each customer
  top_customers <- top_customers %>%
    left_join(data %>% select(Customer.Id, Customer.Fname, Customer.Lname), by = "Customer.Id") %>%
    mutate(CustomerName = paste(Customer.Fname, Customer.Lname, sep = " "))
  
  # create the plot
  plot <- ggplot(top_customers, aes(x = reorder(CustomerName, -total_sales), y = total_sales, fill = CustomerName, text = paste("Customer Name: ", CustomerName, "<br>", "Total Sales: $", total_sales))) +
    geom_bar(stat = "identity") +
    labs(title = "Top_20_Customers") +
    xlab("Customer Name") +
    ylab("Total Sales") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
  
  # convert the plot to plotly
  ggplotly(plot)
}
top_customers_sales_plot(data)
generate_sales_profit_plot <- function(data) {
  
  # Convert order date to date format
  data$order.date..DateOrders. <- as.Date(data$order.date..DateOrders., format = "%m/%d/%Y %H:%M")
  
  # Create data for year dropdown menu
  years <- data %>%
    group_by(year = lubridate::year(order.date..DateOrders.)) %>%
    summarise(total_sales = sum(Sales), total_profit = sum(Order.Profit.Per.Order))
  
  # Create data for month dropdown menu
  months <- data %>%
    group_by(month = lubridate::month(order.date..DateOrders., label = TRUE),
             year = lubridate::year(order.date..DateOrders.)) %>%
    summarise(total_sales = sum(Sales), total_profit = sum(Order.Profit.Per.Order))
  
  # Create data for quarter dropdown menu
  quarters <- data %>%
    group_by(quarter = quarters(order.date..DateOrders.),
             year = lubridate::year(order.date..DateOrders.)) %>%
    summarise(total_sales = sum(Sales), total_profit = sum(Order.Profit.Per.Order))
  
  # Create sales and profit plot
  generate_plot <- function(input_sales_profit, input_year, input_month_quarter) {
    filtered_data <- if (input_month_quarter == "month") {
      months %>%
        filter(year == input_year)
    } else {
      quarters %>%
        filter(year == input_year)
    }
    if (input_sales_profit == "Sales") {
      ggplot(filtered_data, aes(x = get(input_month_quarter), y = total_sales, fill = get(input_month_quarter))) +
        geom_bar(stat = "identity") +
        scale_fill_discrete(name = input_month_quarter) +
        labs(title = paste("Total Sales by", input_month_quarter, "in", input_year), 
             x = input_month_quarter, y = "Sales")
    } else {
      ggplot(filtered_data, aes(x = get(input_month_quarter), y = total_profit, fill = get(input_month_quarter))) +
        geom_bar(stat = "identity") +
        scale_fill_discrete(name = input_month_quarter) +
        labs(title = paste("Total Profit by", input_month_quarter, "in", input_year), 
             x = input_month_quarter, y = "Profit")
    }
  }
  
  return(generate_plot)
}
sales_profit_plot <- generate_sales_profit_plot(data)
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'quarter'. You can override using the
## `.groups` argument.
# Generate sales plot for year 2016  monthly and quarterly 
sales_profit_plot("Sales", 2015, "month")

sales_profit_plot("Sales", 2015, "quarter")

# Generate sales plot for year 2016  monthly and quarterly 
sales_profit_plot("Sales", 2016, "month")

sales_profit_plot("Sales", 2016, "quarter")

# Generate sales plot for year 2017  monthly and quarterly 
sales_profit_plot("Sales", 2017, "month")

sales_profit_plot("Sales", 2017, "quarter")

# Generate sales plot for year 2016  monthly and quarterly 
sales_profit_plot("Sales", 2018, "month")

sales_profit_plot("Sales", 2018, "quarter")